Abstract: New lower bounds on the total variation distance
between the distribution of a sum of independent Bernoulli random
variables and the Poisson random variable (with the same
mean) are derived via the Chen-Stein method. Corresponding
lower bounds on the relative entropy are derived, based on the
lower bounds on the total variation distance and an existing
distribution-dependent refinement of Pinsker's inequality. Two
uses of these bounds are finally outlined. The full version of this
shortened paper is available at http://arxiv.org/abs/1206.6811

Index Terms: Chen-Stein method, Poisson approximation,
relative entropy, total variation distance.

I. Introduction 

Convergence to the Poisson distribution, for the number of 
occurrences of possibly dependent events, naturally arises in 
various applications. Following the work of Poisson, there has 
been considerable interest in how well the Poisson distribution 
approximates the binomial distribution. 

The basic idea that serves as a starting point of the
so-called Chen-Stein method for the Poisson approximation
is the following. Let $\{X_i\}_{i=1}^n$ be independent Bernoulli
random variables with $\mathbb{E}(X_i) = p_i$. Let $W = \sum_{i=1}^n X_i$,
$V_i = \sum_{j \neq i} X_j$ for every $i \in \{1, \ldots, n\}$, and $Z \sim \mathrm{Po}(\lambda)$
with mean $\lambda = \sum_{i=1}^n p_i$. It is rather easy to show that the
equality

$$\mathbb{E}\bigl[\lambda f(Z+1) - Z f(Z)\bigr] = 0 \tag{1}$$

holds for an arbitrary bounded function $f: \mathbb{N}_0 \to \mathbb{R}$.
Furthermore, one can show that (see, e.g., [16, Chapter 2])

$$\mathbb{E}\bigl[\lambda f(W+1) - W f(W)\bigr] = \sum_{j=1}^n p_j^2 \, \mathbb{E}\bigl[f(V_j+2) - f(V_j+1)\bigr] \tag{2}$$

which then serves to get rigorous bounds on the difference
between the distributions of $W$ and $Z$, by the Chen-Stein
method for Poisson approximations. This method, and more
generally the so-called Stein's method, serves as a powerful
tool for the derivation of rigorous bounds for various
distributional approximations. Nice expositions of this method are
provided, e.g., in [2], [16, Chapter 2] and [17]. Furthermore,
some interesting links between the Chen-Stein method and
information-theoretic functionals in the context of Poisson and
compound Poisson approximations are provided in [6].
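
As a quick numerical sanity check of identity (2), the following minimal Python sketch evaluates both of its sides exactly for a small example. The success probabilities `ps`, the test function `f`, and all helper names are illustrative assumptions and are not taken from the paper.

```python
import math

def poisson_binomial_pmf(ps):
    """PMF of a sum of independent Bernoulli(p) variables, built by convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # the new variable equals 0
            nxt[k + 1] += mass * p       # the new variable equals 1
        pmf = nxt
    return pmf

def expect(pmf, g):
    """Expectation of g(K) when K has the given PMF on {0, 1, ...}."""
    return sum(mass * g(k) for k, mass in enumerate(pmf))

def stein_lhs(ps, f):
    """Left-hand side of (2): E[lambda * f(W + 1) - W * f(W)]."""
    lam = sum(ps)
    return expect(poisson_binomial_pmf(ps), lambda k: lam * f(k + 1) - k * f(k))

def stein_rhs(ps, f):
    """Right-hand side of (2): sum_j p_j^2 * E[f(V_j + 2) - f(V_j + 1)]."""
    total = 0.0
    for j, pj in enumerate(ps):
        pmf_vj = poisson_binomial_pmf(ps[:j] + ps[j + 1:])  # V_j leaves out X_j
        total += pj ** 2 * expect(pmf_vj, lambda k: f(k + 2) - f(k + 1))
    return total

ps = [0.1, 0.25, 0.4, 0.05]   # hypothetical success probabilities
f = math.sin                   # an arbitrary bounded test function
lhs, rhs = stein_lhs(ps, f), stein_rhs(ps, f)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding for any choice of bounded `f`.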

Throughout this paper, we use the term 'distribution' to refer
to the discrete probability mass function of an integer-valued
random variable. In the following, we introduce some known
results that are related to the presentation of the new results.

Definition 1: Let $P$ and $Q$ be two probability measures
defined on a set $\mathcal{X}$. Then, the total variation distance between
$P$ and $Q$ is defined by

$$d_{\mathrm{TV}}(P, Q) \triangleq \sup_{\text{Borel } A \subseteq \mathcal{X}} |P(A) - Q(A)| \tag{3}$$

where the supremum is taken w.r.t. all the Borel subsets $A$ of
$\mathcal{X}$. If $\mathcal{X}$ is a countable set then (3) is simplified to

$$d_{\mathrm{TV}}(P, Q) = \frac{1}{2} \sum_{x \in \mathcal{X}} |P(x) - Q(x)| = \frac{\|P - Q\|_1}{2} \tag{4}$$

so the total variation distance is equal to one-half of the $L_1$-distance
between the two probability distributions.

Among old and interesting results that are related to the
Poisson approximation, Le Cam's inequality [14] provides
an upper bound on the total variation distance between the
distribution of the sum $W = \sum_{i=1}^n X_i$ of $n$ independent
Bernoulli random variables $\{X_i\}_{i=1}^n$, where $X_i \sim \mathrm{Bern}(p_i)$,
and a Poisson distribution $\mathrm{Po}(\lambda)$ with mean $\lambda = \sum_{i=1}^n p_i$.
This inequality states that $d_{\mathrm{TV}}(P_W, \mathrm{Po}(\lambda)) \le \sum_{i=1}^n p_i^2$, so if,
e.g., $X_i \sim \mathrm{Bern}(\lambda/n)$ for every $i \in \{1, \ldots, n\}$ (referring to the
case that $W$ is binomially distributed) then this upper bound
is equal to $\lambda^2/n$, thus decaying to zero as $n$ tends to infinity.
The following theorem combines [4, Theorems 1 and 2], and
its proof relies on the Chen-Stein method:

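Theorem 1: Let $W = \sum_{i=1}^n X_i$ be a sum of $n$ independent
Bernoulli random variables with $\mathbb{E}(X_i) = p_i$ for
$i \in \{1, \ldots, n\}$, and $\mathbb{E}(W) = \lambda$. Then, the total variation
distance between the probability distribution of $W$ and the
Poisson distribution with mean $\lambda$ satisfies

$$\frac{1}{32} \Bigl(1 \wedge \frac{1}{\lambda}\Bigr) \sum_{i=1}^n p_i^2 \le d_{\mathrm{TV}}(P_W, \mathrm{Po}(\lambda)) \le \Bigl(\frac{1 - e^{-\lambda}}{\lambda}\Bigr) \sum_{i=1}^n p_i^2 \tag{5}$$

where $a \wedge b \triangleq \min\{a, b\}$ for every $a, b \in \mathbb{R}$.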

As a consequence of Theorem 1, it follows that the ratio
between the upper and lower bounds in (5) is not larger
than 32, irrespective of the values of $\{p_i\}$. The factor $\frac{1}{32}$
in the lower bound was claimed to be improvable to $\frac{1}{14}$ with
no explicit proof (see [5, Remark 3.2.2]). This shows that,
for independent Bernoulli random variables, these bounds are
essentially tight. Furthermore, note that the upper bound in
(5) improves Le Cam's inequality; for large values of $\lambda$, this
improvement is by approximately a factor of $\frac{1}{\lambda}$.
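
To get a feel for these bounds, the sketch below computes the exact total variation distance in the binomial case $X_i \sim \mathrm{Bern}(\lambda/n)$ via (4) and compares it with Le Cam's bound and with both sides of (5). The values n = 100 and λ = 2, as well as the function names, are hypothetical choices made only for illustration.

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(lam, k):
    # Evaluated in the log domain so that k! does not overflow.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_binomial_poisson(n, lam):
    """Exact d_TV(Bin(n, lam/n), Po(lam)) as half the L1 distance, see (4)."""
    p = lam / n
    l1_on_support = sum(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k))
                        for k in range(n + 1))
    # The binomial puts no mass above n, so the Poisson tail adds in full.
    poisson_tail = max(0.0, 1.0 - sum(poisson_pmf(lam, k) for k in range(n + 1)))
    return 0.5 * (l1_on_support + poisson_tail)

def bounds(n, lam):
    """Le Cam's bound and the two sides of (5) for p_i = lam/n."""
    sum_p2 = n * (lam / n) ** 2
    return {
        "bh_lower": min(1.0, 1.0 / lam) / 32.0 * sum_p2,
        "bh_upper": (1.0 - math.exp(-lam)) / lam * sum_p2,
        "le_cam": sum_p2,
    }

n, lam = 100, 2.0   # hypothetical example values
exact = tv_binomial_poisson(n, lam)
print(exact, bounds(n, lam))
```

The exact value is expected to fall between the two sides of (5), with Le Cam's bound being the loosest of the three.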

This paper presents new lower bounds on the total variation
distance between the distribution of a sum of independent
Bernoulli random variables and the Poisson random variable
(with the same mean). The starting point of the derivation of
these new bounds generalizes and improves the analysis by
Barbour and Hall [4], based on the Chen-Stein method for the
Poisson approximation. A new lower bound on the relative
entropy between these two distributions is introduced, and
this lower bound is compared to a previously reported upper
bound on the relative entropy by Kontoyiannis et al. [13]. The
derivation of the new lower bound on the relative entropy
follows from the new lower bound on the total variation
distance, combined with a distribution-dependent refinement
of Pinsker's inequality by Ordentlich and Weinberger [15].
We finally conclude the discussion by outlining two possible
uses of these new lower bounds (which are discussed in more
detail in [18] and [19]). The presentation in this conference
paper relies on the analysis in the full paper version [18].

II. Improved Lower Bounds on the Total Variation Distance

In the following, we introduce an improved lower bound 
on the total variation distance and then provide a loosened 
version of this bound that is expressed in closed form. 

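Theorem 2: In the setting of Theorem 1, the total variation
distance between the probability distribution of $W$ and the
Poisson distribution with mean $\lambda$ satisfies the inequality

$$K_1(\lambda) \sum_{i=1}^n p_i^2 \le d_{\mathrm{TV}}(P_W, \mathrm{Po}(\lambda)) \le \Bigl(\frac{1 - e^{-\lambda}}{\lambda}\Bigr) \sum_{i=1}^n p_i^2 \tag{6}$$

where

$$K_1(\lambda) \triangleq \sup_{\substack{\alpha_1, \alpha_2 \in \mathbb{R}, \\ \alpha_2 \le \lambda + \frac{3}{2}, \\ \theta > 0}} \left( \frac{1 - h_\lambda(\alpha_1, \alpha_2, \theta)}{2 \, g_\lambda(\alpha_1, \alpha_2, \theta)} \right) \tag{7}$$

and

$$h_\lambda(\alpha_1, \alpha_2, \theta) \triangleq \frac{3\lambda + (2 - \alpha_2 + \lambda)^3 - (1 - \alpha_2 + \lambda)^3}{\theta \lambda} + \frac{|\alpha_1 - \alpha_2| \, \bigl(2\lambda + |3 - 2\alpha_2|\bigr) \exp\Bigl(-\frac{(1 - \alpha_2)_+^2}{\theta \lambda}\Bigr)}{\theta \lambda} \tag{8}$$

$$x_+ \triangleq \max\{x, 0\}, \quad x_+^2 \triangleq (x_+)^2, \quad \forall \, x \in \mathbb{R} \tag{9}$$

$$g_\lambda(\alpha_1, \alpha_2, \theta) \triangleq \max \left\{ \, \left| \Bigl(1 + \sqrt{\tfrac{2}{\theta \lambda e}} \cdot |\alpha_1 - \alpha_2| \Bigr) \lambda + \max\{x(u_i)\} \right|, \; \left| \Bigl(2 e^{-\frac{3}{2}} + \sqrt{\tfrac{2}{\theta \lambda e}} \cdot |\alpha_1 - \alpha_2| \Bigr) \lambda - \min\{x(u_i)\} \right| \, \right\} \tag{10}$$

$$x(u) \triangleq (c_0 + c_1 u + c_2 u^2) \exp(-u^2), \quad \forall \, u \in \mathbb{R} \tag{11}$$

$$\{u_i\} \triangleq \bigl\{ u \in \mathbb{R} : 2 c_2 u^3 + 2 c_1 u^2 - 2 (c_2 - c_0) u - c_1 = 0 \bigr\} \tag{12}$$

$$c_0 \triangleq (\alpha_2 - \alpha_1)(\lambda - \alpha_2) \tag{13}$$

$$c_1 \triangleq \sqrt{\theta \lambda} \, (\lambda + \alpha_1 - 2 \alpha_2) \tag{14}$$

$$c_2 \triangleq -\theta \lambda. \tag{15}$$
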

Proof: See [18, Section 4.F]. The derivation relies on
the Chen-Stein method for the Poisson approximation, and it
improves (significantly) the constant in the lower bound by
Barbour and Hall [4, Theorem 2]. The proof in [18] is
self-contained. ■

Remark 1: The upper and lower bounds on the total variation
distance in (6) scale like $\sum_{i=1}^n p_i^2$, similarly to the known
bounds in Theorem 1. The ratio of the upper and lower bounds
in Theorem 1 tends to 32.00 when either $\lambda$ tends to zero or
infinity. It was obtained numerically that the ratio of the upper
bound and new lower bound on the total variation distance is
reduced to 1.69 when $\lambda \to 0$, it is 10.54 when $\lambda \to \infty$, and
it is no more than 12.91 for all $\lambda \in (0, \infty)$.

Remark 2: [9, Theorem 1.2] provides an asymptotic result
for the total variation distance between the distribution of
the sum $W$ of $n$ independent Bernoulli random variables
with $\mathbb{E}(X_i) = p_i$ and the Poisson distribution with mean
$\lambda = \sum_{i=1}^n p_i$. It shows that when $\sum_{i=1}^n p_i \to \infty$ and
$\max_{1 \le i \le n} p_i \to 0$ as $n \to \infty$ then

$$d_{\mathrm{TV}}(P_W, \mathrm{Po}(\lambda)) \sim \frac{1}{\sqrt{2 \pi e} \, \lambda} \sum_{i=1}^n p_i^2. \tag{16}$$

This implies that the ratio of the upper bound on the total
variation distance in [4, Theorem 1] (see Theorem 1 here)
and this asymptotic expression is equal to $\sqrt{2 \pi e} \approx 4.133$.
Therefore, in light of the previous remark (see Remark 1), it
follows that the ratio between the exact asymptotic value in
(16) and the new lower bound in (6) is equal to $\frac{10.54}{\sqrt{2 \pi e}} \approx 2.55$.
It therefore follows from Remark 1 that in the limit where
$\lambda \to 0$, the new lower bound on the total variation distance in (6) is
smaller than the exact value by a factor of no more than 1.69, and for
$\lambda \gg 1$, it is smaller than the exact asymptotic result by a
factor of 2.55.
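
The constants quoted in this remark follow from simple arithmetic, which the short Python sketch below reproduces. The value 10.54 is the numerically obtained limiting ratio quoted in Remark 1 and is taken as given; the variable names are assumptions of this example.

```python
import math

# For large lambda, the upper bound in (5) behaves like S / lambda, where
# S = sum_i p_i^2, while (16) gives S / (sqrt(2*pi*e) * lambda).
upper_over_asymptotic = math.sqrt(2.0 * math.pi * math.e)

# Limiting ratio (lambda -> infinity) of the upper bound to the new lower
# bound, as reported numerically in Remark 1.
upper_over_new_lower = 10.54

# Dividing the two ratios gives asymptotic value / new lower bound.
asymptotic_over_new_lower = upper_over_new_lower / upper_over_asymptotic

print(round(upper_over_asymptotic, 3), round(asymptotic_over_new_lower, 2))
```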

Remark 3: The cardinality of the set $\{u_i\}$ in (12) can be
shown to be 3 (see [18, Section 4]).

Remark 4: The optimization that is required for the
computation of $K_1$ in (7) w.r.t. the three parameters $\alpha_1, \alpha_2 \in \mathbb{R}$ and
$\theta \in \mathbb{R}^+$ is performed numerically. The numerical procedure
for the computation of $K_1$ is presented in [18, Section 4].

In the following, we introduce a looser lower bound on the
total variation distance as compared to the lower bound in
Theorem 2, but its advantage is that it is expressed in closed
form. Both lower bounds improve (significantly) the lower
bound in [4, Theorem 2]. The following lower bound follows
from Theorem 2 by the special choice of $\alpha_1 = \alpha_2 = \lambda$ that is
included in the optimization set for $K_1$ on the right-hand side
of (7). Following this sub-optimal choice, the lower bound in
the next corollary is obtained by a derivation of a closed-form
expression for the third free parameter $\theta \in \mathbb{R}^+$ (in fact, this
was our first step towards the derivation of an improved lower
bound on the total variation distance).

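Corollary 1: Under the assumptions in Theorem 2,

$$\widetilde{K}_1(\lambda) \sum_{i=1}^n p_i^2 \le d_{\mathrm{TV}}(P_W, \mathrm{Po}(\lambda)) \le \Bigl(\frac{1 - e^{-\lambda}}{\lambda}\Bigr) \sum_{i=1}^n p_i^2 \tag{17}$$

where

$$\widetilde{K}_1(\lambda) \triangleq \frac{e}{2 \lambda} \cdot \frac{1 - \frac{1}{\theta} \bigl(3 + \frac{7}{\lambda}\bigr)}{\theta + 2 e^{-1/2}} \tag{18}$$

$$\theta \triangleq 3 + \frac{7}{\lambda} + \frac{1}{\lambda} \cdot \sqrt{(3 \lambda + 7) \bigl[(3 + 2 e^{-1/2}) \lambda + 7\bigr]}. \tag{19}$$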



Proof: See [18, Section 4]. ■
Remark 5: The lower bound on the total variation distance
on the left-hand side of (17) improves uniformly the lower
bound in [4, Theorem 2] (i.e., the left-hand side of Eq. (5)
here). This improvement is shown in Fig. 1.
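
The closed-form bound of Corollary 1 is straightforward to evaluate. The following Python sketch implements (18) and (19) and prints, for a few values of λ, the ratio of the upper bound in (17) to the closed-form lower bound, next to the corresponding ratio for the lower bound in (5). The function names and the sampled values of λ are illustrative assumptions.

```python
import math

B = 2.0 * math.exp(-0.5)   # the constant 2 * e^{-1/2} appearing in (18) and (19)

def theta_closed_form(lam):
    """Eq. (19): the value of theta used in the closed-form bound."""
    return (3.0 + 7.0 / lam
            + math.sqrt((3.0 * lam + 7.0) * ((3.0 + B) * lam + 7.0)) / lam)

def k1_closed_form(lam):
    """Eq. (18): coefficient of sum_i p_i^2 on the left-hand side of (17)."""
    theta = theta_closed_form(lam)
    return (math.e / (2.0 * lam)) * (1.0 - (3.0 + 7.0 / lam) / theta) / (theta + B)

def barbour_hall_lower(lam):
    """Coefficient of sum_i p_i^2 on the left-hand side of (5)."""
    return min(1.0, 1.0 / lam) / 32.0

def upper_coeff(lam):
    """Coefficient of sum_i p_i^2 in the upper bound of (5) and (17)."""
    return -math.expm1(-lam) / lam   # (1 - e^{-lam}) / lam

for lam in (0.01, 0.1, 1.0, 10.0, 100.0):
    ratio_new = upper_coeff(lam) / k1_closed_form(lam)
    ratio_old = upper_coeff(lam) / barbour_hall_lower(lam)
    print(f"lambda={lam:7.2f}  closed-form ratio={ratio_new:6.2f}  "
          f"ratio from (5)={ratio_old:6.2f}")
```

For large λ the closed-form ratio approaches roughly 10.54, which matches the limiting value quoted in Remark 1 for $\lambda \to \infty$.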




Fig. 1. The figure presents curves that correspond to ratios of upper and
lower bounds on the total variation distance between the sum of independent
Bernoulli random variables and the Poisson distribution with the same mean
$\lambda$. The upper bound on the total variation distance for all three curves is
the bound by Barbour and Hall (see [4, Theorem 1] or Theorem 1 here). The
three curves refer to the following lower bounds: the curve
at the bottom (i.e., the one which provides the lowest ratio for a fixed $\lambda$)
corresponds to the improved lower bound on the total variation distance that is
introduced in Theorem 2. The curve slightly above it for small values of $\lambda$
corresponds to the looser lower bound when $\alpha_1$ and $\alpha_2$ in (7) are set to be
equal (i.e., $\alpha_1 = \alpha_2 \triangleq \alpha$ is their common value), so that the optimization
of $K_1$ for this curve is reduced to a two-parameter maximization of $K_1$ over the
two free parameters $\alpha \in \mathbb{R}$ and $\theta \in \mathbb{R}^+$. Finally, the curve at the top of
this figure corresponds to the further loosening of this lower bound where $\alpha$ is
set to be equal to $\lambda$; this leads to a single-parameter maximization of $K_1$ (over
the parameter $\theta \in \mathbb{R}^+$) whose optimization leads to the closed-form expression
of the lower bound in Corollary 1. For comparison, in order to assess the
enhanced tightness of the new lower bounds, note that the ratio of the upper
and lower bounds on the total variation distance from [4, Theorems 1 and 2]
(or Theorem 1 here) is roughly equal to 32 for all values of $\lambda$.



III. Improved Lower Bounds on the Relative Entropy

The following theorem relies on the new lower bound on
the total variation distance in Theorem 2 and the
distribution-dependent refinement of Pinsker's inequality in [15]. Their
combination serves to derive a new lower bound on the relative
entropy between the distribution of a sum of independent
Bernoulli random variables and a Poisson distribution with
the same mean. The following upper bound on the relative
entropy was introduced in [13, Theorem 1]. Together with
the new lower bound on the relative entropy, it leads to the
following statement:

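Theorem 3: In the setting of Theorem 1, the relative entropy
between the probability distribution of $W$ and the Poisson
distribution with mean $\lambda = \mathbb{E}(W)$ satisfies the inequality

$$K_2(\lambda) \Bigl(\sum_{i=1}^n p_i^2\Bigr)^2 \le D\bigl(P_W \,\|\, \mathrm{Po}(\lambda)\bigr) \le \frac{1}{\lambda} \sum_{i=1}^n \frac{p_i^3}{1 - p_i} \tag{20}$$

where

$$K_2(\lambda) \triangleq m(\lambda) \bigl(K_1(\lambda)\bigr)^2 \tag{21}$$

with $K_1$ from (7), and

$$m(\lambda) \triangleq \begin{cases} \Bigl(\dfrac{1}{2 e^{-\lambda} - 1}\Bigr) \log\Bigl(\dfrac{1}{e^{\lambda} - 1}\Bigr) & \text{if } \lambda \in (0, \log 2) \\[2mm] 2 & \text{if } \lambda \ge \log 2. \end{cases} \tag{22}$$
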



Proof: See [18, Section 4]. ■
Remark 6: The combination of the original lower bound on
the total variation distance from [4, Theorem 2] (see (5)) with
Pinsker's inequality gives the following lower bound on the
relative entropy:

$$D\bigl(P_W \,\|\, \mathrm{Po}(\lambda)\bigr) \ge \frac{1}{512} \Bigl(1 \wedge \frac{1}{\lambda^2}\Bigr) \Bigl(\sum_{i=1}^n p_i^2\Bigr)^2. \tag{23}$$

In light of Remark 1, it is possible to quantify the improvement
that is obtained by the new lower bound of Theorem 3 in
comparison to the looser lower bound in (23). The improvement
of the new lower bound on the relative entropy is by a factor
of $179.7 \log\bigl(\frac{1}{\lambda}\bigr)$ for $\lambda \to 0$, a factor of 9.22 for $\lambda \to \infty$,
and at least by a factor of 6.14 for all $\lambda \in (0, \infty)$. The above
conclusions are supported by Figure 2, which refers to the special
case of the relative entropy between the binomial and Poisson
distributions.
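
The bounds of Theorem 3 and (23) can be compared numerically in the binomial case. The Python sketch below is illustrative only: it substitutes the closed-form coefficient of Corollary 1 for $K_1$ in (21), which still yields a valid but looser lower bound because that coefficient does not exceed $K_1$. The parameters n = 100 and λ = 1 and all function names are assumptions of this example, and relative entropies are in nats.

```python
import math

def k1_closed_form(lam):
    """Closed-form coefficient from Corollary 1, Eqs. (18)-(19)."""
    b = 2.0 * math.exp(-0.5)
    a = 3.0 + 7.0 / lam
    theta = a + math.sqrt((3.0 * lam + 7.0) * ((3.0 + b) * lam + 7.0)) / lam
    return (math.e / (2.0 * lam)) * (1.0 - a / theta) / (theta + b)

def m(lam):
    """Eq. (22): factor from the refinement of Pinsker's inequality."""
    if lam < math.log(2.0):
        return math.log(1.0 / math.expm1(lam)) / (2.0 * math.exp(-lam) - 1.0)
    return 2.0

def relative_entropy_binomial_poisson(n, lam):
    """Exact D(Bin(n, lam/n) || Po(lam)), summed over the binomial support."""
    p = lam / n
    total = 0.0
    for k in range(n + 1):
        log_b = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log1p(-p))
        log_q = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_b) * (log_b - log_q)
    return total

def entropy_bounds(n, lam):
    p = lam / n
    sum_p2_sq = (n * p ** 2) ** 2
    return {
        # Eq. (23): Pinsker's inequality with the lower bound in (5)
        "pinsker_bh": min(1.0, 1.0 / lam ** 2) / 512.0 * sum_p2_sq,
        # Eqs. (20)-(21) with the closed-form coefficient in place of K_1
        "new_lower": m(lam) * k1_closed_form(lam) ** 2 * sum_p2_sq,
        # Right-hand side of (20)
        "upper": n * p ** 3 / (1.0 - p) / lam,
    }

n, lam = 100, 1.0   # hypothetical example values
print(relative_entropy_binomial_poisson(n, lam), entropy_bounds(n, lam))
```

Multiplying each of these quantities by $n^2$ removes most of the dependence on $n$, which is the scaling used in Figure 2.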

Remark 7: In [10, Example 6], it is shown that if $\mathbb{E}(X) \le \lambda$
then $D\bigl(P_X \,\|\, \mathrm{Po}(\lambda)\bigr) \ge \frac{1}{2 \lambda} \bigl(\mathbb{E}(X) - \lambda\bigr)^2$. Since $\mathbb{E}(W) = \lambda$,
the right-hand side of this lower bound vanishes, so it is not informative
for the relative entropy $D\bigl(P_W \,\|\, \mathrm{Po}(\lambda)\bigr)$.
Theorem 3 and the loosened bound in (23) are, on the other
hand, informative in the studied case.

We were notified in [12] about the existence of another
recently derived lower bound on the relative entropy
$D\bigl(P_X \,\|\, \mathrm{Po}(\lambda)\bigr)$ in terms of the variance of a random variable
$X$ with values in $\mathbb{N}_0$ (this lower bound appears in a
currently unpublished work). The two bounds were derived
independently, based on different approaches. In the setting
where $X = \sum_{i=1}^n X_i$ is a sum of independent Bernoulli
random variables with $\mathbb{E}(X_i) = p_i$ and $\lambda = \mathbb{E}(X) = \sum_{i=1}^n p_i$,
the two lower bounds on the relative entropy scale
like $\bigl(\sum_{i=1}^n p_i^2\bigr)^2$ but with a different scaling factor.

Fig. 2. This figure refers to the relative entropy between the binomial
and Poisson distributions with the same mean $\lambda$. The horizontal
axis refers to $\lambda$, and the vertical axis refers to a scaled relative entropy
$n^2 D\bigl(\mathrm{Bin}(n, \frac{\lambda}{n}) \,\|\, \mathrm{Po}(\lambda)\bigr)$ ($\sum_{i=1}^n X_i \sim \mathrm{Bin}(n, \frac{\lambda}{n})$ when $X_i \sim \mathrm{Bern}(p_i)$
with $p_i = \frac{\lambda}{n}$ fixed for all $i \in \{1, \ldots, n\}$). This scaling of the relative
entropy is supported by the upper bound on the relative entropy by Kontoyiannis
et al. (see [13, Theorem 1]) that is equal to
$\frac{1}{\lambda} \sum_{i=1}^n \frac{p_i^3}{1 - p_i} = \frac{\lambda^2}{n^2} + O\bigl(\frac{1}{n^3}\bigr)$. It
is also supported by the new lower bounds in Theorem 3 and Eq. (23) since
the common term in these lower bounds is equal to $\bigl(\sum_{i=1}^n p_i^2\bigr)^2 = \frac{\lambda^4}{n^2}$, so
a multiplication of these lower bounds on the relative entropy by $n^2$ gives an
expression that only depends on $\lambda$. It follows from [11, Theorem 1] (see also
[2, p. 2302]) that $D\bigl(\mathrm{Bin}(n, \frac{\lambda}{n}) \,\|\, \mathrm{Po}(\lambda)\bigr) = \frac{\lambda^2}{4 n^2} + O\bigl(\frac{1}{n^3}\bigr)$ (so, the exact value
is asymptotically equal to one-quarter of the upper bound). This figure shows
the upper and lower bounds, as well as the exact asymptotic result, in order to
study the tightness of the existing upper bound and the new lower bounds. By
comparing the dotted and dashed lines, this figure also shows the significant
impact of the refinement of the lower bound on the total variation distance by
Barbour and Hall (see [4, Theorem 2]) on the improved lower bound on the
relative entropy (the former improvement is squared via Pinsker's inequality
or its refinement). Furthermore, by comparing the dotted and solid lines of
this figure, it shows that the probability-dependent refinement of Pinsker's
inequality, applied to the Poisson distribution, affects the lower bound for
$\lambda < \log 2$.

IV. Examples for the Use of the New Bounds 

We conclude our discussion in this paper by outlining two 
uses of the new lower bounds in this work: 

The use of the new lower bound on the total variation
distance for the Poisson approximation of a sum of independent
Bernoulli random variables is exemplified in [19]. The
latter work introduces new entropy bounds for discrete random
variables via maximal coupling, providing bounds on the
difference between the entropies of two discrete random variables
in terms of the local and total variation distances between
their probability mass functions. The new lower bound on the
total variation distance for the Poisson approximation from
this work was involved in the calculation of some improved
bounds on the difference between the entropy of a sum of
independent Bernoulli random variables and the entropy of
a Poisson random variable of the same mean. A possible
application of the latter problem is related to getting bounds
on the sum-rate capacity of a noiseless $K$-user multiple-access
channel with binary inputs. For more details, the reader is
referred to HI] Section 4]. 

The use of the new lower bound on the relative entropy
for the Poisson approximation of a sum of Bernoulli random
variables is exemplified in [18, Section 4.E] in the context of
binary hypothesis testing. The impact of the improvement of
the lower bound on the relative entropy is also exemplified
numerically in this context.